Interactive Visualisation

Column 1

Info

The RMS Titanic sank in the North Atlantic Ocean on 15 April 1912 after striking an iceberg during her voyage from Southampton, South East England, to New York City. An estimated 2,224 passengers and crew were on board, and more than 1,500 died.

The dataset used in this analysis has only 1,309 rows of passenger information. Therefore, any results from this analysis should be treated as estimates.

Death Statistics (Wikipedia)

Column 2

Passenger Count

1309

Number of Males

843

Number of Females

466

Kids < 12

102

teen 12 - 19

105

adult 19 - 65

1054

elder > 65

48

Column 3

Ticket Classes

Ticket Prices

Chart 4

Data Table

About

Reference

Dave Langer 2017, Intro to Machine Learning with R & caret, Data Science Dojo, viewed 22 October 2021, https://www.youtube.com/watch?v=z8PRU46I3NY&t=1492s

Kaggle 2021, Titanic - Machine Learning from Disaster, viewed 22 October 2021, https://www.kaggle.com/c/titanic/data?select=gender_submission.csv

“Untergang der Titanic”, By Willy Stöwer - Magazine Die Gartenlaube, en:Die Gartenlaube and de:Die Gartenlaube, Public Domain, https://commons.wikimedia.org/w/index.php?curid=97646

---
title: "Titanic Analysis"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    storyboard: true
    social: ["linkedin", "twitter", "facebook", "pinterest", "menu"]
    source_code: embed
    theme: readable
---



```{r setup, include=FALSE}

# R Libraries


library(flexdashboard)
library(tidyverse)
library(skimr)
library(caret)
library(DT)
library(plotly)

```


```{r}
# Data import

#train <- read.csv("train.csv")
#test <- read.csv("test.csv")

##### Combine datasets

#train <- train %>% 
#  relocate(Survived, .after = Embarked) %>% 
#  mutate(source = "train")
    
#test <- test %>% 
#  mutate(source = "test")
    
#titanic <- full_join(train, test) 

##### Data Cleaning

#titanic_c <- titanic %>% 
#  dplyr::select(-PassengerId, -Name, -Ticket, -Cabin) %>% 
#  mutate_if(is.character, as.factor) %>% 
#  mutate(Pclass = as.factor(Pclass),
#         Survived = as.factor(Survived),
#         family_size = SibSp + Parch + 1) %>% 
#  relocate(family_size, .after = Parch)

##### Fill missing values in Fare with the median fare

#titanic_c <- titanic_c %>% 
#  mutate(Fare = replace_na(Fare, median(titanic_c$Fare, na.rm = TRUE)))

##### Fill missing values in Embarked with the most frequent level ("S")

#titanic_c$Embarked[titanic_c$Embarked == ""] <- "S"
#titanic_c$Embarked <- droplevels(titanic_c$Embarked)  # drop the now-unused "" level

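##### Replace NA in Age using an imputation model

# Column 9 is Survived (given the column order produced above); it is excluded
# so that the outcome is not used to impute Age
#dummy_formula <- dummyVars(~., data = titanic_c[, -9])
#titanic_c_dummy <- dummy_formula %>% predict(titanic_c[, -9])

##### Impute with bagged tree models

#BagImpute_formula <- titanic_c_dummy %>% preProcess(method = "bagImpute")
#imputed.data <- BagImpute_formula %>% predict(titanic_c_dummy)

##### Extract Age from the dummy matrix

# Column 6 is Age: it follows the three Pclass and two Sex dummy columns
#titanic_c$Age <- imputed.data[, 6]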

#write.csv(titanic, "titanic_full.csv")
#write.csv(titanic_c, "titanic_c.csv")

```
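The Fare and Embarked steps in the chunk above amount to median imputation for a numeric column and mode imputation for a categorical one. A minimal base R sketch of the same idea follows; the `toy` data frame and its values are made up for illustration and are not rows from the Kaggle files.

```r
# Toy data standing in for the Fare and Embarked columns (hypothetical values)
toy <- data.frame(
  Fare     = c(7.25, 71.28, NA, 8.05, 53.10),
  Embarked = c("S", "C", "", "S", "Q"),
  stringsAsFactors = FALSE
)

# Median imputation: replace missing fares with the median of the observed ones
toy$Fare[is.na(toy$Fare)] <- median(toy$Fare, na.rm = TRUE)

# Mode imputation: replace empty strings with the most frequent non-empty level
known      <- toy$Embarked[toy$Embarked != ""]
mode_level <- names(which.max(table(known)))
toy$Embarked[toy$Embarked == ""] <- mode_level
```

Computing the mode from the data, rather than hard-coding `"S"`, keeps the step valid if the input file changes.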



```{r}
# Data import

titanic <- read.csv("titanic_full.csv")
titanic_c <- read.csv("titanic_c.csv")

# Data cleaning

titanic <- titanic %>% dplyr::select(-X)  # Remove the row number variable "X"

titanic_c <- titanic_c %>% 
  dplyr::select(-X) %>%       # Remove the row number variable "X"
  mutate(Age = round(Age),
         Pclass = as.factor(Pclass),
         Survived = as.factor(Survived)) %>% 
  mutate_if(is.character, as.factor) 




```
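The `X` column removed above appears because `write.csv()` stores row names by default, and `read.csv()` then reads that unnamed first column back in as `X`. A small sketch of the round trip, using a hypothetical two-row data frame and temporary files rather than the project's CSVs:

```r
# Hypothetical data frame used only to demonstrate the round trip
df <- data.frame(Age = c(22, 38), Sex = c("male", "female"))

path_default <- tempfile(fileext = ".csv")
path_clean   <- tempfile(fileext = ".csv")

# Default: row names are written as an unnamed first column
write.csv(df, path_default)

# row.names = FALSE: only the data columns are written
write.csv(df, path_clean, row.names = FALSE)

names_default <- names(read.csv(path_default))  # includes the extra "X" column
names_clean   <- names(read.csv(path_clean))    # data columns only
```

Writing the files with `row.names = FALSE` would make the later `select(-X)` step unnecessary.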



Interactive Visualisation
===========================

Column 1 {data-width=400}
---------------------------


![](\Users\karho\Desktop\R\github\titanic\Titanic_fallen.jpg){width=80%}

### Info

The RMS Titanic sank in the North Atlantic Ocean on 15 April 1912 after striking an iceberg during her voyage from Southampton, South East England, to New York City. An estimated 2,224 passengers and crew were on board, and more than 1,500 died.

The dataset used in this analysis has only 1,309 rows of passenger information. Therefore, any results from this analysis should be treated as estimates.

### Death Statistics (Wikipedia)

```{r}
# Estimated deaths (1,500) out of the estimated 2,224 people on board
gauge(1500, min = 0, max = 2224, sectors = gaugeSectors(colors = "red"))
```

Column 2 {data-width=100}
---------------------------

### Passenger Count

```{r}
passenger_count <- count(titanic_c)
valueBox(passenger_count, icon = "fa-users")
```

### Number of Males

```{r}
male <- titanic_c %>% filter(Sex == "male") %>% count()
valueBox(male, icon = "fa-mars", color = "grey")
```

### Number of Females

```{r}
female <- titanic_c %>% filter(Sex == "female") %>% count()
valueBox(female, icon = "fa-venus", color = "grey")
```

### Kids < 12

```{r}
# Assign each passenger to an age band. case_when() checks the conditions in
# order, so every age falls into exactly one band and the boundary ages
# (12, 19 and 65) are not pushed into "elder" by the final catch-all.
titanic_c <- titanic_c %>%
  mutate(age_group = case_when(Age < 12 ~ "kid",
                               Age < 19 ~ "teen",
                               Age <= 65 ~ "adult",
                               TRUE ~ "elder"),
         age_group = factor(age_group, levels = c("kid", "teen", "adult", "elder")))

kid <- titanic_c %>% filter(age_group == "kid") %>% count()
valueBox(kid, color = "orange")
```

### teen 12 - 19

```{r}
teen <- titanic_c %>% filter(age_group == "teen") %>% count()
valueBox(teen, color = "orange")
```

### adult 19 - 65

```{r}
adult <- titanic_c %>% filter(age_group == "adult") %>% count()
valueBox(adult, color = "orange")
```

### elder > 65

```{r}
elder <- titanic_c %>% filter(age_group == "elder") %>% count()
valueBox(elder, color = "orange")
```

Column 3 {data-width=500}
----------------------------

### Ticket Classes

```{r}
# Set up the data frame: passenger count per ticket class
tc <- titanic_c

tc_class <- tc %>%
  group_by(Pclass) %>%
  summarise(count = n())

# Bar chart of passenger counts by ticket class
p1 <- ggplot(tc_class, aes(x = Pclass, y = count, fill = Pclass)) +
  geom_bar(stat = "identity") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold"),
        legend.position = "none") +
  labs(x = "Ticket class", y = "Passenger count")

ggplotly(p1)
```

### Ticket Prices

```{r}
# Box plots of fare by ticket class, one panel per age group;
# the black cross marks the mean fare
plot_tp <- ggplot(tc, aes(x = Pclass, y = Fare, colour = Pclass)) +
  geom_boxplot(outlier.shape = NA) +
  facet_wrap(~ age_group, ncol = 4) +
  theme_bw() +
  stat_summary(fun = "mean", geom = "point", size = 5, shape = 4, color = "black") +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold")) +
  labs(x = "Ticket Classes", y = "Ticket Fare")

ggplotly(plot_tp)
```

### Chart 4

```{r}

```

Data Table
=========================

```{r}
datatable(titanic, options = list(pageLength = 50))
```

About
=========================

*Reference*

Dave Langer 2017, *Intro to Machine Learning with R & caret*, Data Science Dojo, viewed 22 October 2021, https://www.youtube.com/watch?v=z8PRU46I3NY&t=1492s

Kaggle 2021, *Titanic - Machine Learning from Disaster*, viewed 22 October 2021, https://www.kaggle.com/c/titanic/data?select=gender_submission.csv

"Untergang der Titanic", By Willy Stöwer - Magazine Die Gartenlaube, en:Die Gartenlaube and de:Die Gartenlaube, Public Domain, https://commons.wikimedia.org/w/index.php?curid=97646